Search for: All records

Creators/Authors contains: "Chen, Xinyi"


  1. ABSTRACT

    We present a cosmic density field reconstruction method that augments the traditional reconstruction algorithms with a convolutional neural network (CNN). Following previous work, the key component of our method is to use the reconstructed density field as the input to the neural network. We extend this previous work by exploring how the performance of these reconstruction ideas depends on the input reconstruction algorithm, the reconstruction parameters, and the shot noise of the density field, as well as the robustness of the method. We build an eight-layer CNN and train the network with reconstructed density fields computed from the Quijote suite of simulations. The reconstructed density fields are generated by both the standard algorithm and a new iterative algorithm. In real space at z = 0, we find that the reconstructed field is 90 per cent correlated with the true initial density out to $k\sim 0.5 \, \mathrm{ h}\, \rm {Mpc}^{-1}$, a significant improvement over $k\sim 0.2 \, \mathrm{ h}\, \rm {Mpc}^{-1}$ achieved by the input reconstruction algorithms. We find similar improvements in redshift space, including an improved removal of redshift space distortions at small scales. We also find that the method is robust across changes in cosmology. Additionally, the CNN removes much of the variance from the choice of different reconstruction algorithms and reconstruction parameters. However, the effectiveness decreases with increasing shot noise, suggesting that such an approach is best suited to high density samples. This work highlights the additional information in the density field beyond linear scales as well as the power of complementing traditional analysis approaches with machine learning techniques.

     
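    The headline metric quoted above is how well the network output correlates with the true initial density field as a function of scale. Below is a minimal sketch of that diagnostic in plain NumPy: it bins the cross-power of two 3D density grids into shells of |k| and divides by the auto-power of each field. The grid layout, box size, and binning here are placeholder assumptions, and the eight-layer CNN itself is not reproduced.

    import numpy as np

    def cross_correlation_coefficient(delta_rec, delta_true, box_size, n_bins=30):
        """r(k) = P_cross(k) / sqrt(P_rec(k) * P_true(k)) for two density grids.

        delta_rec, delta_true : cubic (n, n, n) overdensity grids
        box_size              : box side length in Mpc/h (placeholder value)
        """
        n = delta_rec.shape[0]
        f_rec = np.fft.rfftn(delta_rec)
        f_true = np.fft.rfftn(delta_true)
        # Wavenumber magnitude of every Fourier mode, in h/Mpc.
        k_fund = 2.0 * np.pi / box_size
        kx = np.fft.fftfreq(n, d=1.0 / n) * k_fund
        kz = np.fft.rfftfreq(n, d=1.0 / n) * k_fund
        kmag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)
        # Accumulate cross- and auto-power in spherical shells of |k|.
        edges = np.linspace(k_fund, kmag.max(), n_bins + 1)
        shell = np.digitize(kmag.ravel(), edges)
        p_cross = np.real(f_rec * np.conj(f_true)).ravel()
        p_rec = (np.abs(f_rec) ** 2).ravel()
        p_true = (np.abs(f_true) ** 2).ravel()
        k_mid = 0.5 * (edges[:-1] + edges[1:])
        r_of_k = np.full(n_bins, np.nan)
        for i in range(n_bins):
            sel = shell == i + 1
            if np.any(sel):
                r_of_k[i] = p_cross[sel].sum() / np.sqrt(p_rec[sel].sum() * p_true[sel].sum())
        return k_mid, r_of_k

    A curve from such a function staying above 0.9 out to k of roughly 0.5 h/Mpc would correspond to the improvement reported in the abstract, versus roughly 0.2 h/Mpc for the input reconstruction alone.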
  2. The targeted insertion and stable expression of a large genetic payload in primary human cells demands methods that are robust, efficient and easy to implement. Large payload insertion via retroviruses is typically semi-random and hindered by transgene silencing. Leveraging homology-directed repair to place payloads under the control of endogenous essential genes can overcome silencing but often results in low knock-in efficiencies and cytotoxicity. Here we report a method for the knock-in and stable expression of a large payload and for the simultaneous knock-in of two genes at two endogenous loci. The method, which we named CLIP (for 'CRISPR for long-fragment integration via pseudovirus'), leverages an integrase-deficient lentivirus encoding a payload flanked by homology arms and 'cut sites' to insert the payload upstream and in-frame of an endogenous essential gene, followed by the delivery of a CRISPR-associated ribonucleoprotein complex via electroporation. We show that CLIP enables the efficient insertion and stable expression of large payloads and of two difficult-to-express viral antigens in primary T cells at low cytotoxicity. CLIP offers a scalable and efficient method for manufacturing engineered primary cells. 
    Free, publicly-accessible full text available May 1, 2024
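    As a toy illustration of the donor layout described above (a payload flanked by homology arms so that it lands in frame upstream of an endogenous essential gene), the sketch below assembles a donor sequence around a hypothetical cut site. The genome string, cut-site coordinate, arm length, and frame check are illustrative placeholders, not the construct, loci, or arm lengths used in the paper.

    def build_donor(genome: str, cut_site: int, payload: str, arm_len: int = 500) -> str:
        """Return left homology arm + payload + right homology arm around a cut site."""
        if len(payload) % 3 != 0:
            # Keep the inserted coding sequence in frame with the downstream gene.
            raise ValueError("payload length must be a multiple of 3")
        left_arm = genome[max(0, cut_site - arm_len):cut_site]
        right_arm = genome[cut_site:cut_site + arm_len]
        return left_arm + payload + right_arm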
  3. Background: Nucleomorphs are remnants of secondary endosymbiotic events between two eukaryote cells wherein the endosymbiont has retained its eukaryotic nucleus. Nucleomorphs have evolved at least twice independently, in chlorarachniophytes and cryptophytes, yet they have converged on a remarkably similar genomic architecture, characterized by the most extreme compression and miniaturization among all known eukaryotic genomes. Previous computational studies have suggested that nucleomorph chromatin likely exhibits a number of divergent features. Results: In this work, we provide the first maps of open chromatin, active transcription, and three-dimensional organization for the nucleomorph genome of the chlorarachniophyte Bigelowiella natans. We find that the B. natans nucleomorph genome exists in a highly accessible state, akin to that of ribosomal DNA in some other eukaryotes, and that it is highly transcribed over its entire length, with few signs of polymerase pausing at transcription start sites (TSSs). At the same time, most nucleomorph TSSs show very strong nucleosome positioning. Chromosome conformation (Hi-C) maps reveal that nucleomorph chromosomes interact with one another at their telomeric regions and show the relative contact frequencies between the multiple genomic compartments of distinct origin that B. natans cells contain. Conclusions: We provide the first study of a nucleomorph genome using modern functional genomic tools, and derive numerous novel insights into the physical and functional organization of these unique genomes.
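    One of the Hi-C observations above is that nucleomorph chromosomes contact one another preferentially at their telomeric regions. The sketch below shows one plausible way to summarize such a signal from a binned contact matrix; the number of end bins treated as "telomeric", the bin size, and the matrix normalization are assumptions, not the paper's actual pipeline.

    import numpy as np

    def telomeric_vs_internal_contacts(contacts, chrom_of_bin, n_end_bins=2):
        """Mean inter-chromosomal contact frequency for telomere-proximal vs. other bins.

        contacts     : (n_bins, n_bins) balanced Hi-C contact matrix
        chrom_of_bin : length-n_bins array of chromosome labels, one per bin
        """
        contacts = np.asarray(contacts, dtype=float)
        chrom_of_bin = np.asarray(chrom_of_bin)
        is_end = np.zeros(len(chrom_of_bin), dtype=bool)
        for chrom in np.unique(chrom_of_bin):
            idx = np.flatnonzero(chrom_of_bin == chrom)
            is_end[idx[:n_end_bins]] = True   # bins at one end of the chromosome
            is_end[idx[-n_end_bins:]] = True  # bins at the other end
        # Restrict to contacts between bins on different chromosomes.
        inter = chrom_of_bin[:, None] != chrom_of_bin[None, :]
        end_pairs = inter & is_end[:, None] & is_end[None, :]
        other_pairs = inter & ~end_pairs
        return contacts[end_pairs].mean(), contacts[other_pairs].mean()

    A markedly higher first value than second would be consistent with the telomere-to-telomere contact pattern described above.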
  6.
    Resource disaggregation is a new architecture for data centers in which resources like memory and storage are decoupled from the CPU, managed independently, and connected through a high-speed network. Recent work has shown that although disaggregated data centers (DDCs) provide operational benefits, applications running on DDCs experience degraded performance due to extra network latency between the CPU and their working sets in main memory. DBMSs are an interesting case study for DDCs for two main reasons: (1) DBMSs normally process data-intensive workloads and require data movement between different resource components; and (2) disaggregation drastically changes the assumption that DBMSs can rely on their own internal resource management. We take the first step to thoroughly evaluate the query execution performance of production DBMSs in disaggregated data centers. We evaluate two popular open-source DBMSs (MonetDB and PostgreSQL) and test their performance with the TPC-H benchmark in a recently released operating system for resource disaggregation. We evaluate these DBMSs with various configurations and compare their performance with that of single-machine Linux with the same hardware resources. Our results confirm that significant performance degradation does occur, but, perhaps surprisingly, we also find settings in which the degradation is minor or where DDCs actually improve performance. 
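    The evaluation described above runs the TPC-H benchmark against MonetDB and PostgreSQL on a disaggregation-oriented operating system and on single-machine Linux. As a minimal sketch of that kind of measurement (not the authors' harness), the snippet below times one TPC-H query against two PostgreSQL endpoints through psycopg2; the connection strings, the choice of query, and the repeat count are assumptions for illustration.

    import time
    import psycopg2  # PostgreSQL client library

    # TPC-H Q6 with standard parameter substitutions; the query set and parameters
    # used in the paper's experiments are not given in the abstract.
    TPCH_Q6 = """
    SELECT sum(l_extendedprice * l_discount) AS revenue
    FROM lineitem
    WHERE l_shipdate >= DATE '1994-01-01'
      AND l_shipdate < DATE '1995-01-01'
      AND l_discount BETWEEN 0.05 AND 0.07
      AND l_quantity < 24;
    """

    def time_query(dsn: str, sql: str, repeats: int = 3) -> float:
        """Run a query several times and return the best wall-clock latency in seconds."""
        best = float("inf")
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            for _ in range(repeats):
                start = time.perf_counter()
                cur.execute(sql)
                cur.fetchall()
                best = min(best, time.perf_counter() - start)
        return best

    # Hypothetical DSNs: one conventional single-machine deployment, one node running
    # a disaggregated setup.
    for label, dsn in [("single-machine", "dbname=tpch host=local-node"),
                       ("disaggregated", "dbname=tpch host=ddc-node")]:
        print(label, f"{time_query(dsn, TPCH_Q6):.3f} s")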
  8. Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses together different rounds of ordered algorithms to reduce synchronization overhead, resulting in 1.2x--3x speedup over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3x speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms. 
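    The extension above targets ordered algorithms in which vertices are processed by a dynamic priority, typically maintained in buckets. The sketch below is a plain-Python, sequential illustration of that pattern (a simplified delta-stepping shortest-path loop); it is not GraphIt code and does not show bucket fusion, which fuses consecutive bucket-processing rounds to reduce synchronization overhead.

    import math
    from collections import defaultdict

    def delta_stepping(graph, source, delta=2):
        """Bucket-ordered single-source shortest paths (simplified, sequential).

        graph : dict mapping vertex -> iterable of (neighbour, weight) pairs,
                with non-negative weights
        delta : bucket width; vertices are relaxed bucket by bucket in
                increasing-distance order
        """
        dist = defaultdict(lambda: math.inf)
        dist[source] = 0
        buckets = defaultdict(set)
        buckets[0].add(source)
        while buckets:
            i = min(buckets)               # lowest non-empty bucket first
            for u in buckets.pop(i):
                if dist[u] // delta != i:  # stale entry: u already moved to a better bucket
                    continue
                for v, w in graph.get(u, []):
                    nd = dist[u] + w
                    if nd < dist[v]:       # relax the edge and re-bucket v by its new priority
                        dist[v] = nd
                        buckets[int(nd // delta)].add(v)
        return {v: d for v, d in dist.items() if d < math.inf}

    # Example: delta_stepping({"a": [("b", 1), ("c", 4)], "b": [("c", 1)]}, "a")
    # returns {"a": 0, "b": 1, "c": 2}.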